N° d'ordre : 2009-ISAL-0036, Année 2009

Authors

  • Michael R. Berthold
  • Maguelonne Teisseire
  • Ruggero G. Pensa
Abstract

An inductive database is a database that contains not only data but also patterns. Inductive databases are designed to support the KDD process. Recent advances in inductive database research have given rise to generic solvers capable of solving inductive queries that are arbitrary Boolean combinations of anti-monotonic and monotonic constraints. These solvers are designed to mine different types of patterns (i.e., patterns from different pattern languages). One instance of such a generic solver is capable of mining string patterns from string data sets. In our main application, promoter sequence analysis, fault tolerance is required, because the data intrinsically contain errors and the phenomenon we are trying to capture is fundamentally degenerate. Our research contribution to fault-tolerant pattern extraction in string data sets is the use of a generic solver, based on a non-trivial formalisation of fault-tolerant pattern extraction as a constraint-based mining task. We identified the stages of the extraction process at which state-of-the-art strategies can be applied to prune the search space. We then developed a fault-tolerant pattern match function, InsDels, that generic constraint-solving strategies can soundly tackle. We also focused on making local patterns actionable. The bottleneck of most local pattern extraction methods is the burden of spurious patterns. As the analysis of patterns by application domain experts is time-consuming, we cannot afford to present patterns without any objective clue about their relevance. We have therefore developed two methods of computing the expected number of patterns extracted from random data sets. If the number of extracted patterns differs strongly from the number expected in random data sets, one can state that the results exhibit local associations that are a priori relevant because they are unexpected.
Among other applications, we have applied our approach to support the discovery of new motifs in gene promoter sequences, with promising results.
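
To make the idea of an inductive query concrete, the following minimal Python sketch mines substrings under a conjunction of one anti-monotonic constraint (minimum support in a positive set) and one monotonic constraint (maximum support in a negative set). It is an illustration under stated assumptions, not the solver described in the abstract: matching is exact rather than fault-tolerant (it does not implement InsDels), the function names `support`, `mine` and `expected_in_shuffled` are hypothetical, and the shuffling baseline is a simple empirical stand-in for, not a reproduction of, the two methods mentioned above.

```python
import random


def support(pattern, strings):
    """Number of strings that contain `pattern` as an exact substring."""
    return sum(pattern in s for s in strings)


def mine(positives, negatives, min_pos, max_neg, alphabet="ACGT", max_len=6):
    """Level-wise search over substrings (hypothetical helper).

    Anti-monotonic constraint: support(p, positives) >= min_pos.
    Monotonic constraint:      support(p, negatives) <= max_neg.
    """
    results = []
    frontier = [""]
    while frontier:
        next_frontier = []
        for p in frontier:
            for c in alphabet:
                q = p + c
                if support(q, positives) < min_pos:
                    # Prune: every extension of q contains q, so none
                    # of them can reach the minimum support either.
                    continue
                if len(q) < max_len:
                    next_frontier.append(q)
                if support(q, negatives) <= max_neg:
                    results.append(q)
        frontier = next_frontier
    return sorted(results)


def expected_in_shuffled(positives, negatives, min_pos, max_neg,
                         trials=20, seed=0):
    """Average number of patterns found when the characters of each
    positive string are shuffled (an assumed, purely empirical baseline)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shuffled = ["".join(rng.sample(s, len(s))) for s in positives]
        total += len(mine(shuffled, negatives, min_pos, max_neg))
    return total / trials


positives = ["ACGT", "TACG", "ACGA"]
negatives = ["TTTT", "GGAC"]
found = mine(positives, negatives, min_pos=3, max_neg=0)
baseline = expected_in_shuffled(positives, negatives, min_pos=3, max_neg=0)
print(found, baseline)
```

Comparing `len(found)` with `baseline` gives a rough indication of whether the extracted patterns are more numerous than would be expected by chance. A fault-tolerant variant would replace the exact `in` test in `support` with an approximate match function.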




Publication date: 2009